Abstract

In this paper, we study the optimal training and data transmission strategies for block fading 
multiple-input multiple-output (MIMO) systems with feedback. We consider both the channel gain 
feedback (CGF) system and the channel covariance feedback (CCF) system. Using an accurate 
capacity lower bound as a figure of merit, we investigate the optimization problems on the temporal 
power allocation to training and data transmission as well as the training length. For CGF systems 
without feedback delay, we prove that the optimal solutions coincide with those for non-feedback 
systems. Moreover, we show that these solutions stay nearly optimal even in the presence of feedback 
delay. This finding is important for practical MIMO training design. For CCF systems, the optimal 
training length can be less than the number of transmit antennas, which is verified through numerical 
analysis. Taking this fact into account, we propose a simple yet near optimal transmission strategy 
for CCF systems, and derive the optimal temporal power allocation over pilot and data transmission. 
I. Introduction

A. Background and Motivation 

The study of multiple-input multiple-output (MIMO) communication systems can be broadly categorized
based on the availability and accuracy of channel state information (CSI) at the receiver or the
transmitter sides. Under the perfect CSI assumption at the receiver, the MIMO channel information 
capacity and data transmission strategies often have elegantly simple forms and many classical results 
exist in the literature [1,2]. From [2-8] we know that the MIMO information capacity with perfect 
receiver CSI can be further increased if some form of CSI is fed back to the transmitter. The transmitter 
CSI can be in the form of causal channel gain feedback (CGF) or channel covariance feedback (CCF). 

In practical communication systems with coherent detection, however, the state of the MIMO 
channel needs to be estimated at the receiver and hence, the receiver CSI is never perfect due to 
noise and time variations in the fading channel. Taking the channel estimation error into account, a 
widely-used capacity lower bound was formulated in [9, 10] for independent and identically distributed 
(i.i.d.) MIMO channels, and the optimal data transmission for CGF systems was studied in [10]. 

Pilot-symbol-assisted modulation (PSAM) has been used in many practical communication systems, 
e.g., in Global System for Mobile Communications (GSM) [11]. In PSAM schemes, pilot (or training) 
symbols are inserted into data blocks periodically to facilitate channel estimation at the receiver [12]. 
It is noted that pilot symbols are not information-bearing signals. Therefore, an important design 
aspect of communication systems is the optimal allocation of resources (such as power and time) to 
pilot symbols that results in the best tradeoff between the quality of channel estimation and rate of 
information transfer. Three pilot parameters under a system designer's control are: 1) spatial structure 
of pilot symbols, 2) temporal power allocation to pilot and data, and 3) the number of pilot symbols 
or simply training length. 

The optimal pilot design has been studied from an information-theoretic viewpoint for non-feedback 
multi-antenna systems of practical interest [9, 13, 14]. For non-feedback MIMO systems with i.i.d. 
channels, the authors in [9] provided optimal solutions for all the aforementioned design parameters
by maximizing the derived capacity lower bound. For CCF systems with correlated MIMO channels,
the optimal solution for the pilot's spatial structure was investigated in [15-17]. However, optimal 
solutions for the temporal pilot power allocation and training length are generally unknown for MIMO 
systems with any form of feedback. Some results were reported in [18] for rank-deficient channel 
covariance matrix known at the transmitter, which are based on a relaxed capacity lower bound. 
However, this relaxed capacity bound is generally loose for moderately to highly correlated channels, 
which can render the provided solutions suboptimal. 

B. Approach and Contributions 

In this paper, we are concerned with the optimal design of pilot parameters for MIMO systems with 
various forms of feedback at the transmitter. Our main design objectives are the optimal temporal 
power allocation to pilot and data symbols, as well as the optimal training length that maximize 
the rate of information transfer in the channel. Our figure of merit is a lower bound on the ergodic
capacity of MIMO systems, which extends the bound derived in [10] for i.i.d. channels to
correlated channels.

We address practical design questions such as: Are the simple solutions provided in [9] for
non-feedback MIMO systems also optimal for systems with feedback? In CGF systems, feedback delay
is unavoidable. If the CGF takes $d$ symbol periods to arrive at the transmitter, the transmitter can
only utilize this information after the first $d$ symbol periods. In this case, we would like to know
whether the optimal pilot design is significantly affected by the feedback delay. Furthermore, for
CCF systems with correlated channels, the optimal training length may be shorter than the number
of transmit antennas, and it is generally difficult to find analytically. In this case, we would like to
know whether a near-optimal, yet simple pilot and data transmission strategy exists.

In this context, the main contributions of this paper are summarized as follows. 

• For delayless CGF with i.i.d. channels, we show that the solutions to the optimal temporal power 
allocation to pilot and data transmission as well as the optimal training length coincide with the 
solutions for non-feedback systems. 



• For delayed CGF systems with i.i.d. channels, our numerical results show that evenly distributing 
the power over the entire data transmission (regardless of the delay time) gives near optimal 
performance at practical signal-to-noise ratio (SNR). As a result, the solutions to the optimal 
temporal power allocation to pilot and data transmission, as well as the optimal training length 
for the delayless system stay nearly optimal regardless of the delay time. 

• For CCF systems with correlated channels, we propose a simple transmission scheme, taking into
account the fact that the training length $L_p$ can be less than the number of transmit antennas. This
scheme only requires numerical optimization of $L_p$ and does not require numerical optimization
of the spatial or temporal power allocation over pilot and data transmission. Our numerical
results show that this scheme is very close to optimal. In addition, our results show that optimizing
$L_p$ can result in a significant capacity improvement for correlated channels.

• Using the proposed scheme for CCF systems, we find the solution to the optimal temporal 
power allocation to pilot and data transmission, which does not depend on the channel spatial 
correlation under a mild condition on block length or SNR. Therefore, the proposed transmission 
and power allocation schemes for CCF systems give near optimal performance while having very 
low computational complexity. 

The rest of the paper is organized as follows. The PSAM transmission scheme, the channel estimation
method, and an accurate capacity lower bound for spatially correlated channels are presented
in Section II. The optimal transmission and power allocation strategies for non-feedback systems are
summarized in Section III. The optimal transmission and power allocation strategies for CGF and CCF
systems are studied in Section IV and Section V, respectively. Finally, the main contributions of this
paper are summarized in Section VI.

Throughout the paper, the following notation will be used: Boldface upper and lower cases denote
matrices and column vectors, respectively. The matrix $\mathbf{I}_N$ is the $N \times N$ identity matrix. $[\cdot]^*$ denotes
the complex conjugate operation, and $[\cdot]^\dagger$ denotes the complex conjugate transpose operation. The
notation $E\{\cdot\}$ denotes the mathematical expectation. $\mathrm{tr}\{\cdot\}$, $|\cdot|$ and $\mathrm{rank}\{\cdot\}$ denote the matrix trace,
determinant and rank, respectively.
Fig. 1. An example of a transmission block of $L$ symbols in a system with delayed feedback. It consists of a training
sub-block, followed by two data sub-blocks. Temporal power allocations are shown at the top and the length of each sub-block
is shown at the bottom.



II. System Model 

We consider a MIMO block-flat-fading channel model with the input-output relationship given by

$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$, (1)

where $\mathbf{y}$ is the $N_r \times 1$ received symbol vector, $\mathbf{x}$ is the $N_t \times 1$ transmitted symbol vector, $\mathbf{H}$ is
the $N_r \times N_t$ channel gain matrix, and $\mathbf{n}$ is the $N_r \times 1$ noise vector having zero-mean circularly
symmetric complex Gaussian (ZMCSCG) entries with variance $\sigma_n^2$. Without loss of generality, we
let $\sigma_n^2 = 1$. The entries of $\mathbf{H}$ are also ZMCSCG with unit variance. We consider spatial correlations
among the transmit antennas only. Therefore, $\mathbf{H} = \mathbf{H}_0 \mathbf{R}_{\mathbf{H}}^{1/2}$, where $\mathbf{H}_0$ has i.i.d. ZMCSCG entries
with unit variance. The spatial correlation at the transmitter is characterized by the covariance matrix
$\mathbf{R}_{\mathbf{H}} = E\{\mathbf{H}^\dagger \mathbf{H}\}/N_r$. In the case where the channels are spatially independent, we have $\mathbf{R}_{\mathbf{H}} = \mathbf{I}_{N_t}$.
We assume that $\mathbf{R}_{\mathbf{H}}$ is a positive definite matrix and denote the eigenvalues of $\mathbf{R}_{\mathbf{H}}$ by
$\mathbf{g} = [g_1\ g_2\ \ldots\ g_{N_t}]^T$. Furthermore, we use the concept of majorization to characterize the degree of
channel spatial correlation [19,20], which is summarized in Appendix I.

A. Transmission Scheme 

Fig. 1 shows an example of a transmission block of $L$ symbol periods in a PSAM scheme. The
channel gains remain constant over one block and change to independent realizations in the next
block. During each transmission block, each transmit antenna sends $L_p$ pilot symbols, followed by
$L_d$ ($= L - L_p$) data symbols as shown in Fig. 1. The receiver performs channel estimation during the
pilot transmission. For CGF systems, the receiver feeds the channel estimates back to the transmitter
once per block to allow adaptive data transmission in the form of power control. In practical scenarios,
there is a time delay of $d$ symbol periods before the transmitter receives the feedback information as
shown in Fig. 1. That is, the data transmission during the first $d$ symbol periods is not adaptive to
the channel, and adaptive transmission is only available for the remaining $L_d - d$ symbol periods. We
define $\beta = d/L_d$ as the feedback delay factor. For CCF systems, less frequent feedback is required as
the channel correlation changes much more slowly than the channel gains. Therefore, we do not consider
feedback delay, i.e., $d = 0$. Note that for non-feedback systems, $d = L_d$.
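
The block structure described above can be illustrated with a minimal Python sketch. The function name `block_structure` and its argument names are hypothetical and chosen only for this illustration; the quantities follow the definitions $L_d = L - L_p$ and $\beta = d/L_d$ given in the text.

```python
def block_structure(L, L_p, d):
    """Split one PSAM block of L symbols into its sub-block lengths.

    Returns (L_d, beta, adaptive_len), where L_d = L - L_p is the data
    length, beta = d / L_d is the feedback delay factor, and
    adaptive_len = L_d - d is the length of the adaptive data sub-block.
    """
    L_d = L - L_p
    if L_d <= 0:
        raise ValueError("the block must contain at least one data symbol")
    if not 0 <= d <= L_d:
        raise ValueError("the delay d must satisfy 0 <= d <= L_d")
    beta = d / L_d
    return L_d, beta, L_d - d
```

With $d = 0$ the sketch describes a delayless system, and with $d = L_d$ it describes a non-feedback system, matching the two limiting cases noted above.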

The total transmission energy per block is given by $\mathcal{P} L T_s$ as shown in Fig. 1, where $\mathcal{P}$ is the
average power per transmission and $T_s$ is the symbol duration. We define the PSAM power factor
as the ratio of the total energy allocated to the data transmission, denoted by $\alpha$. We also denote the
power or SNR per pilot and data transmission by $\mathcal{P}_p$ and $\mathcal{P}_d$,$^1$ respectively. Therefore, we have the
following relationships.

$\mathcal{P} L T_s = \mathcal{P}_p L_p T_s + \mathcal{P}_d L_d T_s$, $\mathcal{P}_p = (1-\alpha)\frac{\mathcal{P} L}{L_p}$, and $\mathcal{P}_d = \alpha\frac{\mathcal{P} L}{L_d}$. (2)
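
As a small numerical illustration of the relationships in (2), the following Python sketch computes the pilot and data power per transmission from the average power, block length, training length and PSAM power factor. The function and argument names are hypothetical; the symbol duration $T_s$ cancels and is therefore omitted.

```python
def psam_powers(P, L, L_p, alpha):
    """Return (P_p, P_d) following the relationships in (2).

    P     : average power per transmission
    L     : block length in symbols
    L_p   : training length in symbols
    alpha : PSAM power factor (fraction of block energy given to data)
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    L_d = L - L_p
    P_p = (1.0 - alpha) * P * L / L_p  # power per pilot transmission
    P_d = alpha * P * L / L_d          # power per data transmission
    return P_p, P_d
```

By construction, `P_p * L_p + P_d * L_d` equals `P * L`, so the total block energy is preserved for any valid choice of `alpha`.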

For feedback systems with a delay of $d$ symbol periods, the total energy for data transmission $\mathcal{P}_d L_d T_s$
is further divided into the non-adaptive data transmission sub-block and the adaptive data transmission
sub-block as shown in Fig. 1. We define the data power division factor as the ratio of the total data
energy allocated to the non-adaptive sub-block, denoted by $\phi$. Therefore, we have the following
relationships.

$\mathcal{P}_d L_d T_s = \mathcal{P}_{d,1}\, d\, T_s + \mathcal{P}_{d,2} (L_d - d) T_s$, $\mathcal{P}_{d,1} = \frac{\phi}{\beta}\mathcal{P}_d$, and $\mathcal{P}_{d,2} = \frac{1-\phi}{1-\beta}\mathcal{P}_d$, (3)

where $\mathcal{P}_{d,1}$ and $\mathcal{P}_{d,2}$ are the power per transmission during the non-adaptive and adaptive sub-blocks.
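
The division in (3) can be sketched in a few lines of Python. The function name and arguments are hypothetical; the sketch assumes $0 < \beta < 1$ so that both sub-blocks are non-empty.

```python
def data_power_division(P_d, phi, beta):
    """Return (P_d1, P_d2) following the relationships in (3).

    P_d  : average power per data transmission
    phi  : data power division factor (energy share of non-adaptive part)
    beta : feedback delay factor d / L_d, assumed to satisfy 0 < beta < 1
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie between 0 and 1")
    P_d1 = phi / beta * P_d                  # non-adaptive sub-block
    P_d2 = (1.0 - phi) / (1.0 - beta) * P_d  # adaptive sub-block
    return P_d1, P_d2
```

Setting `phi` equal to `beta` returns the same power for both sub-blocks, i.e., equal power per data transmission.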

1 Ideally for CGF systems, $\mathcal{P}_d$ should be larger for the transmission blocks over which the channel is strong and smaller
for blocks over which the channel is weak. However, the results in [10] suggest that this temporal data power adaptation
provides little capacity gain, hence it is not considered in this paper.



B. Channel Estimation 

In each transmission block, the receiver performs channel estimation during the pilot transmission.
Combining the first $L_p$ received symbol vectors in an $N_r \times L_p$ matrix, we have

$\mathbf{Y} = \mathbf{H}\mathbf{X}_p + \mathbf{N}$, (4)

where $\mathbf{X}_p$ is the $N_t \times L_p$ pilot matrix and $\mathbf{N}$ is the $N_r \times L_p$ noise matrix.

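Assuming the channel spatial correlation can be accurately measured at the receiver, the channel
gain $\mathbf{H}$ can be estimated using the linear minimum mean square error (LMMSE) estimator [21]. We
denote the channel estimate and estimation error as $\hat{\mathbf{H}} = \hat{\mathbf{H}}_0 \mathbf{R}_{\hat{\mathbf{H}}}^{1/2}$ and $\tilde{\mathbf{H}} = \tilde{\mathbf{H}}_0 \mathbf{R}_{\tilde{\mathbf{H}}}^{1/2}$, respectively,
where $\hat{\mathbf{H}}_0$ and $\tilde{\mathbf{H}}_0$ have i.i.d. ZMCSCG entries with unit variance. $\hat{\mathbf{H}}$ is given as [16]

$\hat{\mathbf{H}} = \mathbf{Y}(\mathbf{X}_p^\dagger \mathbf{R}_{\mathbf{H}} \mathbf{X}_p + \mathbf{I}_{L_p})^{-1} \mathbf{X}_p^\dagger \mathbf{R}_{\mathbf{H}}$. (5)

The covariance matrix of the estimation error is given by [16]

$\mathbf{R}_{\tilde{\mathbf{H}}} = E\{\tilde{\mathbf{H}}^\dagger \tilde{\mathbf{H}}\}/N_r = (\mathbf{R}_{\mathbf{H}}^{-1} + \mathbf{X}_p \mathbf{X}_p^\dagger)^{-1}$. (6)

From the orthogonality property of the LMMSE estimator, we have

$\mathbf{R}_{\hat{\mathbf{H}}} = E\{\hat{\mathbf{H}}^\dagger \hat{\mathbf{H}}\}/N_r = \mathbf{R}_{\mathbf{H}} - \mathbf{R}_{\tilde{\mathbf{H}}}$. (7)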
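
To make (6) and (7) concrete, the following Python sketch evaluates them in the eigen-domain. It assumes, purely for illustration, that $\mathbf{X}_p \mathbf{X}_p^\dagger$ shares its eigenvectors with $\mathbf{R}_{\mathbf{H}}$, so that both covariance matrices become diagonal and each eigen-channel can be treated separately. The function name and arguments are hypothetical.

```python
def lmmse_eigen_variances(g, p):
    """Per-eigen-channel error and estimate variances from (6) and (7).

    g : eigenvalues of the channel covariance matrix (all positive)
    p : eigenvalues of the pilot Gram matrix, aligned with g
        (assumption: pilots are sent along the eigenvectors of R_H)

    Returns (err, est) with err[i] = 1 / (1/g[i] + p[i]) and
    est[i] = g[i] - err[i].
    """
    if len(g) != len(p):
        raise ValueError("g and p must have the same length")
    err = [1.0 / (1.0 / gi + pi) for gi, pi in zip(g, p)]
    est = [gi - ei for gi, ei in zip(g, err)]
    return err, est
```

An eigen-channel that receives no pilot energy keeps its full prior variance as estimation error, while increasing the pilot energy drives the error variance toward zero.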

C. Ergodic Capacity Bounds 

The exact capacity expression under imperfect receiver CSI is still unavailable. We consider a lower 
bound on the ergodic capacity for systems using LMMSE channel estimation [9, 10]. In particular, 
the authors in [10] derived a lower bound and an upper bound for spatially i.i.d. channels. Here we 
extend these results to spatially correlated channels as follows. 

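A lower bound on the ergodic capacity per channel use is given by [10]

$C_{\mathrm{LB}} = E_{\hat{\mathbf{H}}}\{\log_2 |\mathbf{I}_{N_t} + \hat{\mathbf{H}}^\dagger (\mathbf{I}_{N_r} + \boldsymbol{\Sigma}_{\tilde{\mathbf{H}}\mathbf{x}})^{-1} \hat{\mathbf{H}} \mathbf{Q}|\}$, (8)

where $\mathbf{Q} = E\{\mathbf{x}\mathbf{x}^\dagger\}$ is the input covariance matrix, and

$\boldsymbol{\Sigma}_{\tilde{\mathbf{H}}\mathbf{x}} = E\{\tilde{\mathbf{H}}\mathbf{x}\mathbf{x}^\dagger\tilde{\mathbf{H}}^\dagger\} = E\{\tilde{\mathbf{H}}_0 \mathbf{R}_{\tilde{\mathbf{H}}}^{1/2}\mathbf{x}\mathbf{x}^\dagger(\mathbf{R}_{\tilde{\mathbf{H}}}^{1/2})^\dagger\tilde{\mathbf{H}}_0^\dagger\}$
$= E\{\mathrm{tr}\{\mathbf{R}_{\tilde{\mathbf{H}}}^{1/2}\mathbf{x}\mathbf{x}^\dagger(\mathbf{R}_{\tilde{\mathbf{H}}}^{1/2})^\dagger\}\}\mathbf{I}_{N_r} = \mathrm{tr}\{\mathbf{R}_{\tilde{\mathbf{H}}}\mathbf{Q}\}\mathbf{I}_{N_r}$,

where we have used $E\{\tilde{\mathbf{H}}_0 \mathbf{Z} \tilde{\mathbf{H}}_0^\dagger\} = E\{\mathrm{tr}\{\mathbf{Z}\}\}\mathbf{I}_{N_r}$, given that $\tilde{\mathbf{H}}_0$ has i.i.d. entries with unit variance
and is independent of $\mathbf{Z}$. Therefore, the ergodic capacity lower bound per channel use in (8) can be
rewritten as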



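$C_{\mathrm{LB}} = E_{\hat{\mathbf{H}}}\{\log_2 |\mathbf{I}_{N_t} + (1 + \mathrm{tr}\{\mathbf{R}_{\tilde{\mathbf{H}}}\mathbf{Q}\})^{-1} \hat{\mathbf{H}}^\dagger \hat{\mathbf{H}} \mathbf{Q}|\}$. (9)

An upper bound on the ergodic capacity per channel use is given by [10]

$C_{\mathrm{UB}} = E_{\hat{\mathbf{H}}}\{\log_2 |\pi e \boldsymbol{\Sigma}_{\mathbf{y}|\hat{\mathbf{H}}}|\} - E_{\mathbf{x}}\{\log_2 |\pi e (\boldsymbol{\Sigma}_{\tilde{\mathbf{H}}\mathbf{x}|\mathbf{x}} + \mathbf{I}_{N_r})|\}$,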



where 
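
Therefore, the ergodic capacity upper bound per channel use can be written as

$C_{\mathrm{UB}} = E_{\hat{\mathbf{H}}}\{\log_2 |\mathbf{I}_{N_t} + (1 + \mathrm{tr}\{\mathbf{R}_{\tilde{\mathbf{H}}}\mathbf{Q}\})^{-1} \hat{\mathbf{H}}^\dagger \hat{\mathbf{H}} \mathbf{Q}|\} + N_r E_{\mathbf{x}}\{\log_2 \frac{1 + \mathrm{tr}\{\mathbf{R}_{\tilde{\mathbf{H}}}\mathbf{Q}\}}{1 + \mathbf{x}^\dagger \mathbf{R}_{\tilde{\mathbf{H}}} \mathbf{x}}\}$
$= C_{\mathrm{LB}} + C_{\mathrm{gap}}$, (10)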

where $C_{\mathrm{gap}}$ is the difference between the upper bound and the lower bound, which indicates the
maximum error of the bounds. The authors in [10] studied the tightness of the bounds for i.i.d.
channels. They observed that $C_{\mathrm{gap}}/C_{\mathrm{LB}}$ is negligible for Gaussian inputs, hence the bounds are tight.
We find that this is also true for spatially correlated channels with LMMSE estimation. Therefore, the
capacity lower bound per channel use in (9) is accurate enough to be used in our analysis assuming
Gaussian inputs. The average capacity lower bound per transmission block is therefore given by



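$\bar{C}_{\mathrm{LB}} = \frac{L_d}{L} C_{\mathrm{LB}} = \frac{L_d}{L} E_{\hat{\mathbf{H}}}\{\log_2 |\mathbf{I}_{N_t} + (1 + \mathrm{tr}\{\mathbf{R}_{\tilde{\mathbf{H}}}\mathbf{Q}\})^{-1} \hat{\mathbf{H}}^\dagger \hat{\mathbf{H}} \mathbf{Q}|\}$. (11)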

In this paper, the average capacity lower bound in (11) will be used as the figure of merit. We will
use "capacity lower bound" and "capacity" interchangeably throughout the rest of this paper.
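
As a rough illustration of how the figure of merit in (11) can be evaluated, the following Python sketch estimates it by Monte Carlo simulation for the single-antenna special case $N_t = N_r = 1$, which is an assumption made here only to keep the example free of matrix computations. In this case (6) reduces to an error variance of $1/(1 + \mathcal{P}_p L_p)$, and the squared magnitude of a unit-variance ZMCSCG channel estimate is exponentially distributed with unit mean. The function name, arguments, sample size and seed are hypothetical.

```python
import math
import random


def siso_avg_capacity_lb(P, L, L_p, alpha, n_samples=20000, seed=0):
    """Monte Carlo estimate of (11) for N_t = N_r = 1 (illustrative case)."""
    L_d = L - L_p
    P_p = (1.0 - alpha) * P * L / L_p   # pilot power from (2)
    P_d = alpha * P * L / L_d           # data power from (2)
    var_err = 1.0 / (1.0 + P_p * L_p)   # scalar form of (6)
    var_est = 1.0 - var_err             # scalar form of (7)
    # Effective SNR appearing inside the logarithm of (9).
    snr_eff = var_est * P_d / (1.0 + var_err * P_d)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        lam = rng.expovariate(1.0)      # |h_0|^2 for a unit-variance channel
        total += math.log2(1.0 + snr_eff * lam)
    # Scale by L_d / L to account for the symbols spent on training.
    return (L_d / L) * total / n_samples
```

The sketch makes the training trade-off visible: a larger pilot share lowers the estimation error but also lowers the data power and, through $L_d/L$, the fraction of the block that carries information.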



III. Non-feedback Systems 

A. Spatially i.i.d. Channels 

The optimal pilot and data transmission scheme and optimal power allocation for non-feedback 
systems with spatially i.i.d. channels were studied in [2,9], and their main results are summarized 
as follows. The optimal transmission strategy is to transmit orthogonal pilots and independent data 
among the transmit antennas with spatially equal power allocation to each antenna during both pilot 
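and data transmission. The optimal PSAM power factor $\alpha^*$ is given by

$\alpha^* = \begin{cases} \gamma - \sqrt{\gamma(\gamma - 1)}, & \text{for } L_d > N_t \\ \frac{1}{2}, & \text{for } L_d = N_t \\ \gamma + \sqrt{\gamma(\gamma - 1)}, & \text{for } L_d < N_t \end{cases}$ (12)

where $\gamma = \frac{N_t + \mathcal{P}L}{\mathcal{P}L(1 - N_t/L_d)}$. With the optimal $\alpha$, the optimal training length is $L_p^* = N_t$. For equal power
allocation to pilot and data, i.e., $\mathcal{P}_p = \mathcal{P}_d = \mathcal{P}$, $L_p^*$ should be found numerically.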
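
The closed-form solution in (12) is simple to evaluate numerically. The following Python sketch is one possible implementation; the function name and the default choice $L_p = N_t$ are assumptions made for this illustration.

```python
import math


def optimal_alpha(P, L, N_t, L_p=None):
    """Evaluate the optimal PSAM power factor in (12).

    P   : average SNR per transmission (linear scale)
    L   : block length in symbols
    N_t : number of transmit antennas
    L_p : training length; defaults to N_t (illustrative assumption)
    """
    if L_p is None:
        L_p = N_t
    L_d = L - L_p
    if L_d <= 0:
        raise ValueError("the block must contain at least one data symbol")
    if L_d == N_t:
        return 0.5
    gamma = (N_t + P * L) / (P * L * (1.0 - N_t / L_d))
    root = math.sqrt(gamma * (gamma - 1.0))
    # gamma > 1 when L_d > N_t, and gamma < 0 when L_d < N_t, so the
    # argument of the square root is positive in both branches.
    return gamma - root if L_d > N_t else gamma + root
```

For example, with `P = 10`, `L = 100` and `N_t = 4`, the sketch returns a value of about 0.82, meaning that most of the block energy goes to data while each pilot symbol still receives more power than each data symbol.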

B. Spatially Correlated Channels 

In non-feedback systems where the transmitter does not know the channel correlation, it is difficult
to find the optimal resource allocation and transmission strategies. Consequently, no results have been
found on the optimal or suboptimal solution to $\alpha^*$ and $L_p^*$. Intuitively, the amount of training resource
required should reduce as the channels become more spatially correlated. Therefore, one may use the
solution to $\alpha^*$ and $L_p^*$ for i.i.d. channels as a robust strategy for correlated channels in non-feedback
systems. Similarly, one may still use the optimal transmission strategies for i.i.d. channels to ensure
a robust system performance for correlated channels, which can be justified by the following two
theorems.

Theorem 1: For non-feedback systems with spatially correlated channels in PSAM schemes, the
transmission of orthogonal training sequences among the transmit antennas with spatially equal
power allocation minimizes the channel estimation errors for the least-favourable channel correlation,
i.e., using $\mathbf{X}_p \mathbf{X}_p^\dagger = \frac{\mathcal{P}_p L_p}{N_t} \mathbf{I}_{N_t}$ is a robust training scheme.

Proof: see Appendix II.



Theorem 2: For non-feedback systems with spatially correlated channels in PSAM schemes, the
transmission of i.i.d. data sequences among the transmit antennas with spatially equal power
allocation, i.e., $\mathbf{Q} = \frac{\mathcal{P}_d}{N_t} \mathbf{I}_{N_t}$, (a) maximizes the capacity for the least-favourable channel correlation at
sufficiently low SNR, and (b) is the optimal transmission scheme at sufficiently high SNR.

Proof: see [22].

Remark: From Theorem 1 and Theorem 2, we see that the optimal transmission strategy for i.i.d.
channels is also a robust choice for correlated channels in non-feedback systems.

IV. Channel Gain Feedback (CGF) Systems 

In this section, we consider systems having a noiseless feedback link from the receiver to the
transmitter (e.g., a low rate feedback channel). After the receiver performs pilot-assisted channel
estimation, it feeds the channel estimates back to the transmitter. Once the transmitter receives the
estimated channel gains, it performs spatial power adaptation accordingly. We consider the channels
to be spatially i.i.d.$^2$ Since the data transmission utilizes all the channels with equal probability, it is
reasonable to have at least as many measurements as the number of channels for channel estimation,
which implies that $L_p \geq N_t$. From [9], we know that the optimal training consists of orthogonal
pilots with equal power allocated to each antenna.

A. CGF System with No Feedback Delay 

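Firstly, we study an ideal scenario in which the transmitter receives the estimated channel gains
at the start of the data transmission, i.e., $d = 0$. For a given $\mathcal{P}_d$, the ergodic capacity lower bound per
channel use in (9) can be rewritten as

$C_{\mathrm{LB}} = E_{\hat{\mathbf{H}}_0}\{\log_2 |\mathbf{I}_{N_t} + \frac{\sigma_{\hat{H}}^2}{1 + \sigma_{\tilde{H}}^2 \mathcal{P}_d} \hat{\mathbf{H}}_0^\dagger \hat{\mathbf{H}}_0 \mathbf{Q}|\}$
$= E_{\boldsymbol{\lambda}}\{\sum_{i=1}^{N_t} \log_2 (1 + \frac{\sigma_{\hat{H}}^2}{1 + \sigma_{\tilde{H}}^2 \mathcal{P}_d} \lambda_i q_i)\}$, (13)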



2 We will provide some discussion for CGF systems with correlated channels in Section V-E.



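where $\sigma_{\tilde{H}}^2 = (1 + \frac{\mathcal{P}_p L_p}{N_t})^{-1}$, $\sigma_{\hat{H}}^2 = 1 - \sigma_{\tilde{H}}^2$, and $\boldsymbol{\lambda} = [\lambda_1\ \lambda_2\ \ldots\ \lambda_{N_t}]^T$ denote the eigenvalues
of $\hat{\mathbf{H}}_0^\dagger \hat{\mathbf{H}}_0$. It was shown in [10] that the capacity is maximized when the matrix $\mathbf{Q}$ has the same
eigenvectors as $\hat{\mathbf{H}}_0^\dagger \hat{\mathbf{H}}_0$. The eigenvalues of $\mathbf{Q}$ can be found via the standard water-filling given by

$q_i = [\eta - (\frac{\sigma_{\hat{H}}^2}{1 + \sigma_{\tilde{H}}^2 \mathcal{P}_d} \lambda_i)^{-1}]^+$ with $\sum_{i=1}^{N_t} q_i = \mathcal{P}_d$, (14)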

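where $\eta$ represents the water level, and $[z]^+ = \max\{z, 0\}$. We refer to the number of non-zero $q_i$ as
the number of active eigen-channels, denoting this number by $m$. Therefore, (13) can be reduced to

$C_{\mathrm{LB}} = E_{\boldsymbol{\lambda}}\{\sum_{i=1}^{m} \log_2 (\frac{\sigma_{\hat{H}}^2}{1 + \sigma_{\tilde{H}}^2 \mathcal{P}_d} \lambda_i \eta)\}$ (15)

$= E_{\boldsymbol{\lambda}}\{m \log_2 (\frac{\sigma_{\hat{H}}^2 \mathcal{P}_d}{1 + \sigma_{\tilde{H}}^2 \mathcal{P}_d} + \sum_{i=1}^{m} \lambda_i^{-1}) - m \log_2 m + \sum_{i=1}^{m} \log_2 \lambda_i\}$, (16)

where (16) is obtained by substituting $\eta$ from (14) into (15). It should be noted that the expectation $E_{\boldsymbol{\lambda}}$ in (15) and
(16) is over the $m$ largest values in $\boldsymbol{\lambda}$.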
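
The water-filling step in (14) can be sketched in Python as follows. The function name is hypothetical, and `gains` stands for the effective eigen-channel gains $\frac{\sigma_{\hat{H}}^2}{1 + \sigma_{\tilde{H}}^2 \mathcal{P}_d}\lambda_i$, which are assumed to be strictly positive. The sketch searches for the number of active eigen-channels $m$ by starting from all channels and dropping the weakest one until the water level exceeds every active inverse gain.

```python
def water_fill(gains, total_power):
    """Standard water-filling: q_i = max(eta - 1/gains[i], 0), sum(q) = total_power.

    gains       : strictly positive effective eigen-channel gains
    total_power : total power to distribute (positive)

    Returns (q, eta), with q in the same order as gains and eta the water level.
    """
    if total_power <= 0 or any(a <= 0 for a in gains):
        raise ValueError("gains and total_power must be positive")
    # Indices sorted from the strongest to the weakest eigen-channel.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    inv = [1.0 / gains[i] for i in order]
    for m in range(len(gains), 0, -1):
        eta = (total_power + sum(inv[:m])) / m
        if eta - inv[m - 1] > 0:  # weakest active channel still gets power
            break
    q = [0.0] * len(gains)
    for rank, i in enumerate(order[:m]):
        q[i] = eta - inv[rank]
    return q, eta
```

For gains `[2.0, 1.0, 0.1]` and unit total power, the weakest eigen-channel is switched off and the remaining two receive 0.75 and 0.25, so that $m = 2$ in the notation above.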

Using (16), we now look for the optimal value of $\mathcal{P}_d$. The following two theorems summarize the
results on the optimal PSAM power factor $\alpha^*$ as well as the optimal training length $L_p^*$.

Theorem 3: For delayless CGF systems with i.i.d. channels in PSAM schemes, the optimal PSAM
power factor $\alpha^*$ is given by (12).

Proof: see Appendix III.

Theorem 4: For delayless CGF systems with i.i.d. channels in PSAM schemes adopting the optimal
PSAM power factor $\alpha^*$, the optimal training length equals the number of transmit antennas, that is,
$L_p^* = N_t$.

Proof: see Appendix IV.

Remark: Theorem 3 and Theorem 4 show that the optimal pilot design for delayless CGF systems
coincides with that for non-feedback systems in Section III-A. That is to say, one can use the same
design to achieve optimal performance in both non-feedback and CGF systems.



B. CGF System with Feedback Delay 

For practical systems, a finite duration of $d$ symbol periods is required before feedback comes
into effect at the transmitter as shown in Fig. 1. Therefore, the transmitter has no knowledge about
the channel during the first data sub-block of $d$ transmissions, which is equivalent to non-feedback
systems. From [2], we know that the transmitter should allocate equal power to each transmit antenna
during the first data sub-block (or the non-adaptive sub-block). After receiving the estimated channel
gains, the transmitter performs spatial power water-filling similar to Section IV-A during the second
data sub-block (or the adaptive sub-block) of length $L_d - d$. Note that a CGF system with $d = L_d$
is equivalent to a non-feedback system.

In order to optimize the PSAM power factor $\alpha$, we apply a two-stage optimization approach. Firstly, we
optimize the data power division factor $\phi$ for a given total data power constraint. Then, we optimize
the PSAM power factor $\alpha$.

In general, we find that there is no closed-form solution for the optimal data power division factor
$\phi^*$. Furthermore, when the channel estimation error is large, the capacity lower bound is not globally
concave on $\phi \in [0, 1]$. Nevertheless, the block length $L$ of CGF systems is usually large (which will
be discussed further at the end of Section IV). From the results on the optimal PSAM power factor $\alpha^*$
and optimal training length $L_p^*$ in Section III-A and Section IV-A, we also expect that $\mathcal{P}_p \gg \mathcal{P}$ when
$L \gg 1$. This implies that the channel estimation errors in CGF systems are often small. Therefore,
we can investigate the optimal data power division assuming perfect channel estimation to obtain
some insights into the optimal solution for imperfect channel estimation. In the following, we will
see that a good approximation of the optimal solution is given by $\phi^* \approx \beta$ for practical SNR values
under perfect channel estimation.

From (3) we see that less power per transmission is allocated to the non-adaptive sub-block
(i.e., $\mathcal{P}_{d,1} < \mathcal{P}_{d,2}$) if $\phi < \beta$, and vice versa. The average capacity lower bound for data transmission
with perfect channel knowledge (i.e., no training) is given by




(17) 



where the water-filling solution for $q_i$ with water level $\nu$ is given by

Fig. 2. The optimal data power division factor $\phi^*$ vs. data transmission SNR $\mathcal{P}_d$ for different values of the delay factor
$\beta$ and antenna sizes. Perfect channel estimation is assumed.

It can be shown that $\bar{C}_{\mathrm{LB}}$ in (17) is concave on $\phi \in [0, 1]$.$^3$ Using the Karush-Kuhn-Tucker (KKT)
conditions [23], the optimal data power division factor $\phi^*$ can be found as

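$\phi^* = \begin{cases} 0, & \text{if } E_{\boldsymbol{\lambda}}\{\lambda_i\} < E_{\boldsymbol{\lambda}}\{\nu^{-1}\}, \\ \ldots \end{cases}$ (19)

Note that the entries in $\boldsymbol{\lambda}$ are the eigenvalues of a Wishart matrix with parameter $(N_t, N_r)$ [2].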

Fig. 2 shows the optimal data power division factor $\phi^*$ given by (19) versus data transmission
SNR $\mathcal{P}_d$ for different delay factors $\beta$ and antenna sizes assuming perfect channel estimation. It can
be seen that $\phi^*$ quickly increases from 0 to $\beta$ at very low SNR. For moderate to high SNR, $\phi^*$ stays
above $\beta$ and converges to $\beta$ as $\mathcal{P}_d \to \infty$.$^4$ More importantly, we see that $\phi^*$ is close to $\beta$ at practical

3 This can be shown from the first and second derivatives of $\bar{C}_{\mathrm{LB}}$ w.r.t. $\phi$ for any fixed number of active eigen-channels
$m$. In particular, one can show that $\frac{d\bar{C}_{\mathrm{LB}}}{d\phi}$ is continuous on $\phi \in [0, 1]$ and $\frac{d^2\bar{C}_{\mathrm{LB}}}{d\phi^2} < 0$ for any fixed $m$. Combining these two
facts, one can conclude that $\bar{C}_{\mathrm{LB}}$ is concave on $\phi \in [0, 1]$. The detailed derivation is omitted for brevity.

4 $\phi^*$ for the ($N_t = 4$, $N_r = 2$) system starts to converge back to $\beta$ at a higher SNR, which is not shown in Fig. 2. This is
because the use of spatial water-filling in data transmission gives a significant improvement in the capacity when $N_t > N_r$.



SNR range, e.g., $\mathcal{P}_d > 0$ dB. Therefore, we conclude that $\phi = \beta$ is a near optimal solution. From
(3) we see that $\phi = \beta$ is actually the simplest solution which allocates the same amount of power
during each data transmission in both non-adaptive and adaptive sub-blocks, i.e., $\mathcal{P}_{d,1} = \mathcal{P}_{d,2} = \mathcal{P}_d$.
Furthermore, this simple solution does not require the knowledge of the feedback delay time.

Having $\phi^* \approx \beta$ for perfect channel estimation, we argue that $\phi^* \approx \beta$ still holds for imperfect
channel estimation and will verify its optimality using numerical results. This choice of $\phi$ leads to
a simple solution for the optimal PSAM power factor $\alpha^*$, as well as the optimal training length $L_p^*$
for delayed CGF systems summarized in Corollary 1, which can be shown by combining the results
in Theorem 3, Theorem 4, and those for the non-feedback systems summarized in Section III-A.

Corollary 1: For delayed CGF systems with i.i.d. channels in PSAM schemes, temporally distributing
equal power per transmission over both the non-adaptive and adaptive data sub-blocks is a
simple and efficient strategy, i.e., $\phi = \beta$. With this strategy, the optimal PSAM power factor $\alpha^*$ and
the optimal training length $L_p^*$ coincide with those in the delayless case given in Theorem 3 and Theorem 4.

C. Numerical Results 

Now, we present numerical results to illustrate the capacity gain from optimizing the PSAM power
factor. The numerical results also validate the optimality of the transmission strategy in Corollary 1.

Fig. 3 shows the average capacity lower bound $\bar{C}_{\mathrm{LB}}$ in (11) versus SNR $\mathcal{P}$ for delayless CGF
systems (i.e., $d = 0$) with i.i.d. channels and different antenna sizes. The solid lines indicate systems
using $\alpha^*$ and $L_p^*$ ($L_p^* = 4$ in this case). The dashed lines indicate systems using equal temporal
power allocation and $L_p^*$ found numerically. Comparing the solid and dashed lines, we see that the
capacity gain from optimal temporal power allocation is approximately 9% at 0 dB and 6% at 20
dB for all three systems. This range of capacity gain (5% to 10%) was also observed in [9] for
non-feedback systems, which can be viewed as an extreme case of a delayed CGF system with $d = L_d$.
From the results for the extreme cases, i.e., $d = 0$ and $d = L_d$, we conclude that the capacity gain
from optimizing the PSAM power factor is around 5% to 10% at practical SNR for delayed CGF
systems with i.i.d. channels.
Fig. 3. Average capacity lower bound $\bar{C}_{\mathrm{LB}}$ in (11) vs. SNR $\mathcal{P}$ for delayless CGF systems ($\beta = 0$) with i.i.d. channels
and different antenna sizes. The block length is $L = 100$. Both optimal temporal power allocation to pilot and data as well
as equal power allocation are shown for comparison. For optimal temporal power allocation, the training length is $L_p^* = 4$;
while for equal power allocation, the pilot length is optimized numerically.



We now consider delayed CGF systems to verify Corollary 1. Fig. 4 shows the average capacity
lower bound $\bar{C}_{\mathrm{LB}}$ in (11) versus SNR $\mathcal{P}$ for delayed CGF systems with i.i.d. channels and different
antenna sizes. In this example, a transmission block of length $L = 100$ consists of a training sub-block
of $L_p = 4$ symbol periods, followed by a non-adaptive data sub-block of $d = 20$ symbol periods$^5$ and
an adaptive data sub-block of $L_d - d = 76$ symbol periods. Therefore, the delay factor is $\beta = 0.208$.
The lines indicate the use of $\phi = \beta$, and the markers indicate optimal data power division found
through numerical optimization using $\bar{C}_{\mathrm{LB}}$ in (11). The values of $\phi^*$ for SNR = 4 dB, 10 dB and
16 dB are shown in the figure as well. We see that the capacity difference between the system using
$\phi = \beta$ and $\phi = \phi^*$ is negligible. That is to say, the use of temporal equal power transmission over the
entire data block is near optimal for systems with channel estimation errors. We have also confirmed

5 The delay length d takes into account the channel estimation and other processing time at the receiver and transmitter, 
as well as the time spent on the transmission of low-rate feedback. 
Fig. 4. Average capacity lower bound in (11) vs. SNR $\mathcal{P}$ for delayed CGF systems with i.i.d. channels and different antenna
sizes. Within a block length of $L = 100$, the training length is $L_p = 4$, followed by a non-adaptive data transmission
sub-block of length $d = 20$ and an adaptive data transmission sub-block of length 76. The lines indicate the use of
$\phi = \beta = 0.208$, and the markers indicate optimal data power division factor found numerically.



that this trend is valid for a wide range of block lengths (results are omitted for brevity). These results
validate Corollary 1.

It is noted that we have assumed the feedback link to be noiseless. When noise is present, the capacity
that can be achieved by adaptive transmission decreases as the noise in the feedback link increases. The
capacity reduction due to corrupted channel gain estimates was studied in [24]. It was shown that the 
capacity reduction can increase quickly with the noise in the estimated channel gains. Therefore, a 
reliable feedback scheme which minimizes the noise in the estimated channel gains is important for 
CGF systems. Furthermore, CGF systems need frequent feedback particularly when the block length 
is relatively small. This requires a significant amount of feedback overhead in the reverse link (from 
the receiver to the transmitter), which may cause a direct reduction in the overall information rate, 
especially when both the forward and the reverse links are operating at the same time, e.g., in cellular 
systems. Therefore, the CGF scheme may not be appropriate in fast fading environments where the
block length is small.



V. Channel Covariance Feedback (CCF) Systems 



As discussed in the previous section, CGF systems require frequent use of feedback due to the
rapid change in the channel gains. On the other hand, the statistics of the channel gains change much
more slowly than the channel gains themselves. As a result, it is practical for the receiver to accurately
measure the channel covariance matrix and feed it back to the transmitter at a much lower frequency 
with negligible feedback overhead and delay. Note that for completely i.i.d. channels, there is no need 
for CCF. In this section, we consider CCF systems with spatially correlated channels and investigate 
the optimal pilot and data transmission strategy, as well as the optimal power allocation. 

A. Proposed Transmission Scheme 

Intuitively, the amount of training resource required for spatially correlated channels should be
less than that for i.i.d. channels, as spatial correlation reduces the uncertainty in the channel gains.
From [9], we know for i.i.d. channels that the optimal training length $L_p^*$ equals the number of
transmit antennas provided that the optimal PSAM power factor $\alpha^*$ is used. Therefore, we expect
that $L_p^* < N_t$ for correlated channels if we optimize $\alpha$. However, most studies on the optimal pilot
design for correlated channels assume $L_p \geq N_t$ [15-17]. It was shown in [16] that the optimal training
strategy is to train along the eigenvectors of the channel covariance matrix with training power being
waterfilled according to the eigenvalues of the channel covariance matrix. Since we expect $L_p < N_t$,
we modify the training strategy such that only the $L_p$ strongest eigen-channels are trained.

We perform eigenvalue decomposition on $R_H$ as $R_H = U G U^\dagger$ and let the eigenvalues of $R_H$ be
sorted in descending order in $g = [g_1 \; g_2 \; \ldots \; g_{N_t}]^T$. The optimal training sequence which minimizes
the channel estimation errors (i.e., $\mathrm{tr}\{R_{\tilde{H}}\}$) has the property that the eigenvalue decomposition of
$X_p X_p^\dagger$ is given by $X_p X_p^\dagger = U P U^\dagger$ [16], where $P$ is a diagonal matrix. The entries of $P$ which
minimize the channel estimation errors follow a water-filling solution given by

\[
p_i = \begin{cases} \mu - g_i^{-1}, & i = 1, \ldots, L_p, \\ 0, & i = L_p + 1, \ldots, N_t, \end{cases}
\qquad \text{with } \sum_{i=1}^{L_p} p_i = \mathcal{P}_p L_p, \tag{20}
\]

where $\mu$ is the water level and $p = [p_1 \; p_2 \; \ldots \; p_{N_t}]^T$ are the eigenvalues of $X_p X_p^\dagger$. In practice, the
transmitter can ensure that the number of non-zero $p_i$ equals $L_p$ by changing $L_p$ accordingly.
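The water-filling rule in (20) has a closed-form water level once $L_p$ is fixed, since the power constraint gives $\mu = \mathcal{P}_p + \frac{1}{L_p}\sum_{i=1}^{L_p} g_i^{-1}$. The following is a minimal sketch of that computation, not the authors' implementation; the function name `training_power_allocation` and its argument names are illustrative assumptions.

```python
import numpy as np

def training_power_allocation(g, L_p, P_p):
    """Pilot powers over the L_p strongest eigen-channels, following (20).

    g   : eigenvalues of the channel covariance matrix (any order, all > 0)
    L_p : training length, i.e. number of eigen-channels to train
    P_p : pilot power per transmission, so the pilot budget is P_p * L_p
    Returns (p, mu), with p aligned to g sorted in descending order.
    """
    g = np.sort(np.asarray(g, dtype=float))[::-1]
    inv_g = 1.0 / g[:L_p]
    # Water level fixed by the budget: sum_i (mu - 1/g_i) = P_p * L_p.
    mu = P_p + inv_g.mean()
    p = np.zeros_like(g)
    p[:L_p] = mu - inv_g
    if np.any(p[:L_p] <= 0.0):
        # Fewer than L_p channels receive power; the text suggests changing L_p.
        raise ValueError("water level too low for this L_p")
    return p, mu
```

For example, `training_power_allocation([0.4, 2.5, 0.1, 1.0], 2, 3.0)` trains the two eigen-channels with eigenvalues 2.5 and 1.0 and leaves the remaining two untrained.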

For data transmission, it was shown that the optimal strategy is to transmit along the eigenvectors
of $R_H$ under perfect channel estimation [4-6]. With channel estimation errors, one strategy is to
transmit data along the eigenvectors of $R_{\hat{H}}$. With the proposed training sequence, it is easy to show
from (6) and (7) that the eigenvectors of $R_{\hat{H}}$ and $R_{\tilde{H}}$ are the same as those of $R_H$. Therefore, the
eigenvalue decomposition of $R_{\hat{H}}$ can be written as $R_{\hat{H}} = U \hat{G} U^\dagger$, and we set $Q = U \hat{Q} U^\dagger$, where
$\hat{Q}$ is a diagonal matrix with entries denoted by $q_i$, $\forall i = 1, \ldots, N_t$.

However, there is no closed-form solution to the optimal spatial power allocation even with perfect
channel estimation [4-6]. Following the proposed training scheme, we propose to transmit data through
the $L_p$ trained eigen-channels with equal power. That is,

\[
q_i = \begin{cases} \mathcal{P}_d / L_p, & i = 1, \ldots, L_p, \\ 0, & i = L_p + 1, \ldots, N_t. \end{cases} \tag{21}
\]
For the proposed training and data transmission scheme, the capacity lower bound per channel use 

in (O reduces to 

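\[
C_{LB} = E_{H_0}\left\{ \log_2 \left| I_{N_t} + H_0^\dagger H_0 \hat{G} \hat{Q} \left(1 + \mu^{-1} \mathcal{P}_d\right)^{-1} \right| \right\}, \tag{22}
\]

where $\hat{Q}$ is the diagonal matrix with entries $q_i$ and the (diagonal) entries of $\hat{G}$ are given by
$\hat{g}_i = g_i - \mu^{-1}$, $\forall i = 1, \ldots, L_p$ and $\hat{g}_i = 0$, $\forall i = L_p + 1, \ldots, N_t$, which is derived from (6), (7) and (20).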

B. Optimal Temporal Power Allocation 

Now, we investigate the optimal PSAM power factor $\alpha^*$ using the capacity lower bound given in
(22). The result is summarized in the following theorem.

Theorem 5: For CCF systems in PSAM schemes with the transmission strategy proposed in Section
V-A, the optimal PSAM power factor $\alpha^*$ is given by (12) with $\gamma = \frac{L_d}{L_d - L_p}$, provided that
$\mathcal{P}L \gg \sum_{i=1}^{L_p} g_i^{-1}$.

Proof: see Appendix V.
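As a quick numerical illustration of Theorem 5, the sketch below evaluates $\alpha^* = \gamma - \sqrt{\gamma(\gamma - 1)}$ (the form stated at the end of Appendix V) with $\gamma = L_d/(L_d - L_p)$. It assumes the block consists only of pilots and data, i.e. $L = L_p + L_d$, and covers only the case $L_d > L_p$; the function names are illustrative.

```python
import math

def gamma_ccf(L, L_p):
    """gamma = L_d / (L_d - L_p) from Theorem 5, assuming L = L_p + L_d."""
    L_d = L - L_p
    if L_d <= L_p:
        raise ValueError("this sketch only covers the case L_d > L_p")
    return L_d / (L_d - L_p)

def psam_power_factor(gamma):
    """alpha* = gamma - sqrt(gamma * (gamma - 1)); valid for gamma > 1."""
    return gamma - math.sqrt(gamma * (gamma - 1.0))

# Example: L = 20 and L_p = 2 give L_d = 18, gamma = 1.125 and alpha* = 0.75,
# i.e. three quarters of the block energy would go to data transmission.
alpha_star = psam_power_factor(gamma_ccf(20, 2))
```

Note that neither function takes the eigenvalues $g_i$ as input, which mirrors the statement that $\alpha^*$ does not depend on the spatial correlation when the condition of the theorem holds.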



Remark: It is noted that $\gamma$ in the optimal solution in Theorem 5 is essentially the same as the one
given in Section III-A when $\mathcal{P}L \gg 1$. The condition $\mathcal{P}L \gg \sum_{i=1}^{L_p} g_i^{-1}$ can be easily satisfied
when the block length is not too small or the SNR is moderate to high (i.e., $\mathcal{P}L \gg 1$), and the spatial
correlation between any trained channels is not close to 1. Therefore, the result in Theorem 5 applies
to many practical scenarios. It is important to note that the optimal PSAM power factor $\alpha^*$ given
in Theorem 5 does not depend on the channel spatial correlation, provided the condition is met. In
other words, this unique design is suitable for a relatively wide range of channel spatial correlation.



The following steps describe the algorithm for transmission design of CCF systems:

1. For each $L_p$ ($L_p \leq N_t$), design the pilot and data transmission according to Section V-A.

2. Perform temporal power allocation to pilot and data according to Section V-B.

3. Numerically compare the capacity lower bound in (11) for different $L_p$ and choose $L_p^*$ which
maximizes the capacity.
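The three steps above can be sketched as a small Monte Carlo search. This is an illustrative sketch under stated assumptions rather than the authors' code: it assumes $L = L_p + L_d$, $\alpha = \mathcal{P}_d L_d / (\mathcal{P}L)$, the correlation model of Section V-D, and that the average bound in (11) equals $L_d/L$ times the per-channel-use bound in (22). It also uses the $\alpha^*$ of Theorem 5 for every $L_p$. All function names are hypothetical.

```python
import numpy as np

def exp_corr_eigs(n_t, rho):
    """Descending eigenvalues of the model [R_H]_ij = rho^|i-j|."""
    idx = np.arange(n_t)
    R = rho ** np.abs(idx[:, None] - idx[None, :])
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def average_capacity_lb(g, L_p, L, P, n_r, n_trials=500, seed=0):
    """Monte Carlo estimate of (L_d / L) times the bound in (22), in bits.

    g must be sorted in descending order. Returns None when L_p is not
    feasible in this sketch.
    """
    n_t = len(g)
    L_d = L - L_p                        # assumed block structure
    if L_d <= L_p:
        return None                      # only L_d > L_p is treated here
    # Step 2: temporal power allocation (Theorem 5).
    gamma = L_d / (L_d - L_p)
    alpha = gamma - np.sqrt(gamma * (gamma - 1.0))
    P_d = alpha * P * L / L_d            # assumed definition of alpha
    P_p = (1.0 - alpha) * P * L / L_p
    # Step 1: water-filling pilots (20) and equal-power data (21).
    inv_g = 1.0 / g[:L_p]
    mu = P_p + inv_g.mean()
    if np.any(mu - inv_g <= 0.0):
        return None                      # fewer than L_p non-zero pilot powers
    d = np.zeros(n_t)
    d[:L_p] = (g[:L_p] - 1.0 / mu) * (P_d / L_p) / (1.0 + P_d / mu)
    rng = np.random.default_rng(seed)    # same seed for every L_p
    total = 0.0
    for _ in range(n_trials):
        H0 = (rng.standard_normal((n_r, n_t))
              + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2.0)
        # (H0^H H0) times a diagonal matrix: scale the columns by d.
        M = np.eye(n_t) + (H0.conj().T @ H0) * d[None, :]
        total += np.linalg.slogdet(M)[1] / np.log(2.0)
    return (L_d / L) * total / n_trials

def best_training_length(g, L, P, n_r, **kwargs):
    """Step 3: evaluate every L_p <= N_t and keep the maximizer."""
    results = {}
    for L_p in range(1, len(g) + 1):
        c = average_capacity_lb(g, L_p, L, P, n_r, **kwargs)
        if c is not None:
            results[L_p] = c
    return max(results, key=results.get), results
```

A call such as `best_training_length(exp_corr_eigs(4, 0.5), L=20, P=10.0, n_r=4)` returns the selected training length together with the estimated bound for each candidate $L_p$. Reusing the same random seed for every $L_p$ keeps the comparison between training lengths from being dominated by Monte Carlo noise.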



C. A Special Case: Beamforming 

Beamforming is a special case of the proposed transmission scheme where only the strongest
eigen-channel is used, i.e., $L_p = 1$. The use of beamforming significantly reduces the complexity of
the system as it allows the use of well-established scalar codec technology and only requires the
knowledge of the strongest eigen-channel (not the complete channel statistics) [5]. For beamforming
transmission, the capacity lower bound in (22) reduces to

\[
C_{LB} = E_{h_0}\left\{ \log_2 \left( 1 + h_0^\dagger h_0 \, \frac{(g_{\max} - \mu^{-1}) \mathcal{P}_d}{1 + \mu^{-1} \mathcal{P}_d} \right) \right\}, \tag{23}
\]

where $h_0$ is an $N_r \times 1$ vector with i.i.d. ZMCSCG and unit variance entries, $g_{\max}$ is the largest
eigenvalue in $g$, and $\mu = \mathcal{P}_p + g_{\max}^{-1}$, which can be found by letting $L_p = 1$ in (20).

Theorem 6: For CCF systems in PSAM schemes with beamforming, the optimal PSAM power
factor $\alpha^*$ is given in (12) with $\gamma = \frac{1 + g_{\max} \mathcal{P}L}{g_{\max} \mathcal{P}L (L-2)/(L-1)}$.

Proof: The proof can be obtained by letting $L_p = 1$ and $g_i = g_{\max}$ in the proof of Theorem 5. □

Fig. 5. Optimal PSAM power factor $\alpha^*$ vs. channel spatial correlation factor $\rho$ for CCF 4 × 4 systems with a block length
of $L = 20$ and SNR = 10 dB. All values of $\alpha^*$ are found numerically.

Remark: It can be shown for the beamforming case that $\frac{d\alpha^*}{dg_{\max}} > 0$. Therefore, the optimal PSAM
power factor $\alpha^*$ increases as the channel spatial correlation increases, that is to say, more power
should be allocated to data transmission when the channels become more correlated. When $\mathcal{P}L \gg 1$,
$\gamma$ reduces to $\frac{L-1}{L-2}$, hence $\alpha^*$ does not depend on the channel correlation.
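The two observations in this remark can be checked numerically from Theorem 6. The sketch below is illustrative only: it assumes $\alpha^* = \gamma - \sqrt{\gamma(\gamma - 1)}$ as in Appendix V, and the function name is hypothetical.

```python
import math

def alpha_star_beamforming(g_max, P, L):
    """alpha* for beamforming (L_p = 1), with gamma taken from Theorem 6."""
    gamma = (1.0 + g_max * P * L) / (g_max * P * L * (L - 2) / (L - 1))
    return gamma - math.sqrt(gamma * (gamma - 1.0))

# alpha* grows with the largest eigenvalue at a fixed, small P*L ...
low_snr = [alpha_star_beamforming(g, P=0.1, L=20) for g in (1.0, 2.0, 4.0)]

# ... and approaches the correlation-independent limit when P*L is large,
# where gamma tends to (L - 1) / (L - 2).
L = 20
gamma_limit = (L - 1) / (L - 2)
alpha_limit = gamma_limit - math.sqrt(gamma_limit * (gamma_limit - 1.0))
```

At small $\mathcal{P}L$ the three values in `low_snr` are strictly increasing, while at large $\mathcal{P}L$ the dependence on `g_max` becomes negligible.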

D. Numerical Results 

For numerical analysis, we choose the channel covariance matrix to be in the form of $[R_H]_{ij} = \rho^{|i-j|}$,
where $\rho$ is referred to as the spatial correlation factor [16,25]. Our numerical results validate
the solution to the optimal PSAM power factor given in Theorem 5 and Theorem 6. The results
also show that optimizing the training length can significantly improve the capacity, and the simple
transmission scheme proposed in Section V-A gives near optimal performance.
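For readers who wish to reproduce the setup, the covariance model $[R_H]_{ij} = \rho^{|i-j|}$ and its sorted eigenvalues $g$ can be generated as follows. This is a small sketch; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def exponential_correlation(n_t, rho):
    """Covariance matrix with entries rho^|i-j| for n_t transmit antennas."""
    idx = np.arange(n_t)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def sorted_eigenvalues(R):
    """Eigenvalues of a symmetric matrix in descending order (the vector g)."""
    return np.sort(np.linalg.eigvalsh(R))[::-1]

# Example: a 2 x 2 system with rho = 0.5 has eigenvalues 1 + rho and 1 - rho.
g = sorted_eigenvalues(exponential_correlation(2, 0.5))
```

Since the diagonal entries are all one, the eigenvalues always sum to $N_t$; increasing $\rho$ concentrates that sum in the largest eigenvalue.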

Fig. 5 shows the optimal PSAM power factor $\alpha^*$ found numerically versus the channel correlation
factor $\rho$ for CCF 4 × 4 systems with a block length of $L = 20$ and SNR of 10 dB. We see that $\alpha^*$
remains constant before the correlation factor gets close to 1 for $L_p > 1$, and this value of $\alpha^*$ is the
same as the analytical value computed from Theorem 5. For the beamforming case where $L_p = 1$,
we see that $\alpha^*$ does not depend on the channel correlation, which agrees with our earlier observation
from Theorem 6. Similar to CGF systems, we have also compared the capacity achieved using $\alpha^*$
and that using equal power allocation over pilot and data, and the same trend is observed (results
are omitted for brevity), that is, the capacity gain from optimizing the PSAM power factor is around 5% to
10% at practical SNR.

Fig. 6. Average capacity lower bound $C_{LB}$ in (11) vs. channel spatial correlation factor $\rho$ for CCF 2 × 2 systems with a
block length of $L = 20$ and SNR of 10 dB. Training lengths of $L_p = 1$ and $L_p = 2$ are shown. For $L_p = 2$, both spatially
equal data power allocation (dashed lines) and optimal data power allocation found numerically (solid lines) are shown.

In our proposed transmission scheme for CCF systems, spatially equal power allocation is used
for data transmission. Here we illustrate the optimality of this simple scheme in Fig. 6, which shows
the average capacity lower bound $C_{LB}$ in (11) versus channel correlation factor $\rho$ for CCF 2 × 2
systems. We compute the capacity achieved using $L_p = 1$, and $L_p = 2$ with spatially equal power
allocation for data transmission (solid line) and optimal power allocation found numerically (dashed
line) for a block length of $L = 20$.$^6$ We also indicate the critical $\rho$ at which $L_p^*$ changes from 2 to 1
in Fig. 6. It is clear that the capacity loss from spatially optimal power allocation to spatially equal
power allocation increases as $\rho$ increases. At the critical $\rho$, this capacity loss is only around 1.5%.
We also studied the results for different values of block lengths and the same trend was found (results
are omitted for brevity). These results imply that our proposed transmission scheme is very close to
optimal provided that the training length is optimized.

Fig. 7. Average capacity lower bound $C_{LB}$ in (11) vs. channel spatial correlation factor $\rho$ for CCF 4 × 4 systems with a
block length of $L = 20$ and SNR = 10 dB. The optimal PSAM power factor $\alpha^*$ is used in all results.

Fig. 7 shows the average capacity lower bound $C_{LB}$ in (11) versus the channel correlation factor $\rho$
for CCF 4 × 4 systems with a block length of $L = 20$ and SNR of 10 dB. The optimal PSAM power
factor $\alpha^*$ shown in Fig. 5 is used in the capacity computation. Comparing the capacity with different
training lengths, we see that $L_p^*$ decreases as the channel becomes more correlated. More importantly,
the capacity gain from optimizing the training length according to the channel spatial correlation can
be significant. For example, the capacity at $\rho = 0.5$ using $L_p = 4$ (which is optimal for i.i.d. channels)
is approximately 6.3 bits per channel use, while the capacity at $\rho = 0.5$ using $L_p^* = 2$ is around 7
bits per channel use, that is to say, optimizing the training length results in a capacity improvement of
11% at $\rho = 0.5$. Moreover, the capacity improvement increases as channel correlation increases. The
same trends are found for different values of block lengths, although the capacity improvement by
optimizing the training length reduces as the block length increases (results are omitted for brevity).
Therefore, it is important to numerically optimize the training length for correlated channels at small
to moderate block lengths.

$^6$ We see that the capacity increases with channel spatial correlation in the case of beamforming, while it is not monotonic
for $L_p = 2$. These observations were explained in [22] using Schur-convexity of capacity in the channel correlation.

Furthermore, one can record the range of $\rho$ for each value of $L_p^*$ from Fig. 7 and observe the
value of $\alpha^*$ in the corresponding range of $\rho$ in Fig. 5. It can be seen that within the range of $\rho$ where
a given $L_p$ is optimal, the value of $\alpha^*$ for the given $L_p$ is a constant given by Theorem 5, provided
that $\mathcal{P}L \gg 1$. That is to say, the condition in Theorem 5 (i.e., $\mathcal{P}L \gg \sum_{i=1}^{L_p} g_i^{-1}$) can be simplified to
$\mathcal{P}L \gg 1$ provided that the training length is optimized.

E. Hybrid CGF and CCF Systems 

After studying the optimal transmission and power allocation strategy for CGF systems with i.i.d.
channels and CCF systems with correlated channels, we provide some discussion on systems utilizing
both CGF and CCF with correlated channels. For spatially correlated channels, the optimal training
follows a water-filling solution according to the channel covariance, and the optimal data transmission
follows a water-filling solution according to the estimated channel gains. The two different water-filling
solutions make the problem of optimizing the PSAM power factor mathematically intractable.
Furthermore, the optimal training length $L_p^*$ may be smaller than the number of transmit antennas,
and needs to be found numerically. However, from the results for CGF systems with i.i.d. channels
in Section IV and CCF systems with correlated channels in Section V, one may expect that a good
solution for the optimal PSAM power factor $\alpha^*$ in the hybrid system is given in Theorem 5.

VI. Summary of Results 

In this paper, we have studied block fading MIMO systems with feedback in PSAM transmission 
schemes. Two typical feedback systems are considered, namely the channel gain feedback and the
channel covariance feedback systems. Using an accurate capacity lower bound as the figure of merit,
we have provided the solutions for the optimal power allocation to training and data transmission as
well as the optimal training length. Table I summarizes the design guidelines for both non-feedback
systems and feedback systems.



TABLE I
Summary of Design Guidelines

System | Channel | Design Guidelines | Reference

Non-feedback | i.i.d. | • Transmit orthogonal pilots among antennas with spatially equal power.
• Transmit independent data among antennas with spatially equal power.
• The optimal PSAM power factor $\alpha^*$ is given by (12) with $\gamma = \frac{N_t + \mathcal{P}L}{\mathcal{P}L(1 - N_t/L_d)}$.
• The optimal training length $L_p^*$ equals the number of transmit antennas $N_t$. | [2,9]

Non-feedback | correlated | • Use the designs for i.i.d. channels as a robust choice. | Sec. III-B

CGF | i.i.d. | • Transmit orthogonal pilots with spatially equal power.
• Transmit independent data with spatially equal power in data sub-block 1 and spatial power water-filling in data sub-block 2 (see Fig. 1).
• Distribute equal power per transmission throughout data sub-blocks 1 and 2.
• $\alpha^*$ and $L_p^*$ for non-feedback systems are (near) optimal for (delayed) CGF systems. | Sec. IV

CCF | correlated | • For a given $L_p$ ($L_p \leq N_t$), transmit pilots along the $L_p$ strongest eigen-channels with spatial power water-filling according to (20).
• Transmit data along the $L_p$ trained eigen-channels with spatially equal power.
• $\alpha^*$ is given by (12) with $\gamma = \frac{L_d}{L_d - L_p}$, provided that $\mathcal{P}L \gg \sum_{i=1}^{L_p} g_i^{-1}$.
• $L_p^*$ should be numerically optimized.
• For beamforming (i.e., $L_p = 1$), $\alpha^*$ is given by (12) with $\gamma = \frac{1 + g_{\max}\mathcal{P}L}{g_{\max}\mathcal{P}L(L-2)/(L-1)}$. | Sec. V


Appendix I 

A Measure of Channel Spatial Correlation 
A vector $a = [a_1 \; a_2 \; \ldots \; a_n]^T$ is said to be majorized by another vector $b = [b_1 \; b_2 \; \ldots \; b_n]^T$ if

\[
\sum_{i=1}^{k} a_i \leq \sum_{i=1}^{k} b_i, \quad k = 1, \ldots, n-1, \quad \text{and} \quad \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i, \tag{24}
\]

where the elements in both vectors are sorted in descending order [26]. We denote the relationship
as $a \prec b$. Any real-valued function $\Phi$, defined on a vector subspace, is said to be Schur-convex if
$a \prec b$ implies $\Phi(a) \leq \Phi(b)$ [26]. Similarly, $\Phi$ is Schur-concave if $a \prec b$ implies $\Phi(a) \geq \Phi(b)$.
Following [20], we have the following definition:

Definition 1: Let $a$ contain the eigenvalues of a channel covariance matrix $R_a$, and $b$
contain the eigenvalues of another channel covariance matrix $R_b$. The elements in both vectors are
sorted in descending order. Then $R_a$ is less correlated than $R_b$ if and only if $a \prec b$.
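The majorization test in (24), and hence the ordering in Definition 1, is straightforward to check numerically. The helper below is a minimal sketch; its name and the tolerance argument are illustrative choices.

```python
import numpy as np

def is_majorized_by(a, b, tol=1e-9):
    """Return True if a is majorized by b in the sense of (24)."""
    a = np.sort(np.asarray(a, dtype=float))[::-1]
    b = np.sort(np.asarray(b, dtype=float))[::-1]
    if a.shape != b.shape:
        raise ValueError("vectors must have the same length")
    ca, cb = np.cumsum(a), np.cumsum(b)
    partial_ok = np.all(ca[:-1] <= cb[:-1] + tol)   # k = 1, ..., n-1
    total_ok = abs(ca[-1] - cb[-1]) <= tol          # equal total sums
    return bool(partial_ok and total_ok)

# Eigenvalues of an uncorrelated 2 x 2 channel versus a correlated one.
iid_less_correlated = is_majorized_by([1.0, 1.0], [1.5, 0.5])
```

Here the i.i.d. eigenvalue vector is majorized by the correlated one, so under Definition 1 the i.i.d. channel is the less correlated of the two.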

Appendix II 
Proof of Theorem 1 

This is a max-min problem where the MSE of the channel estimates is to be minimized by
$X_p X_p^\dagger$ and to be maximized by $R_H$. We need to show that $\inf_{X_p X_p^\dagger} \sup_{R_H} \mathrm{tr}\{R_{\tilde{H}}\}$ is achieved
by an orthogonal pilot sequence with equal power allocated among the transmit antennas, i.e., $X_p X_p^\dagger = \frac{\mathcal{P}_p L_p}{N_t} I_{N_t}$,
assuming $L_p \geq N_t$.

From © we see that 

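\[
\sup_{R_H} \mathrm{tr}\{R_{\tilde{H}}\} \geq \mathrm{tr}\{(I_{N_t} + X_p X_p^\dagger)^{-1}\} = \sum_{i=1}^{N_t} (1 + p_i)^{-1}, \tag{25}
\]

where $p = [p_1 \; p_2 \; \ldots \; p_{N_t}]^T$ are the eigenvalues of $X_p X_p^\dagger$. Since the sum of a convex function of
$p_i$ is Schur-convex in $p$ [26], we conclude that (25) is Schur-convex in $p$. Since $\mathrm{tr}\{X_p X_p^\dagger\} = \mathcal{P}_p L_p$,
we have

\[
\sup_{R_H} \mathrm{tr}\{R_{\tilde{H}}\} \geq \sum_{i=1}^{N_t} \left(1 + \frac{\mathcal{P}_p L_p}{N_t}\right)^{-1}, \tag{26}
\]

where we have used $X_p X_p^\dagger = \frac{\mathcal{P}_p L_p}{N_t} I_{N_t}$. Note that (26) holds for any $X_p X_p^\dagger$. On the other hand,

\[
\inf_{X_p X_p^\dagger} \sup_{R_H} \mathrm{tr}\{R_{\tilde{H}}\}
\leq \sup_{R_H} \mathrm{tr}\left\{\left(R_H^{-1} + \frac{\mathcal{P}_p L_p}{N_t} I_{N_t}\right)^{-1}\right\}
= \sup_{g} \sum_{i=1}^{N_t} \left(g_i^{-1} + \frac{\mathcal{P}_p L_p}{N_t}\right)^{-1}
= \sum_{i=1}^{N_t} \left(1 + \frac{\mathcal{P}_p L_p}{N_t}\right)^{-1}, \tag{27}
\]

where (27) is obtained using the Schur-concavity of $\sum_{i=1}^{N_t} \left(g_i^{-1} + \frac{\mathcal{P}_p L_p}{N_t}\right)^{-1}$ in $g$. From (26) and (27)
we conclude that

\[
\inf_{X_p X_p^\dagger} \sup_{R_H} \mathrm{tr}\{R_{\tilde{H}}\} = \sum_{i=1}^{N_t} \left(1 + \frac{\mathcal{P}_p L_p}{N_t}\right)^{-1},
\]

which can be achieved by $X_p X_p^\dagger = \frac{\mathcal{P}_p L_p}{N_t} I_{N_t}$. □

Fig. 8. A sketch example of $\rho_{\rm eff}$ vs. $\alpha$. The vertical dashed lines indicate the values of $\alpha$ at which $m$ changes its value.
$\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ indicate the local optimal values of $\alpha$ which give locally maximal $\rho_{\rm eff}$.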



Appendix III
Proof of Theorem 3

With $\sigma_{\tilde{H}}^2 = \left(1 + \frac{\mathcal{P}_p L_p}{N_t}\right)^{-1}$, $\sigma_{\hat{H}}^2 = 1 - \sigma_{\tilde{H}}^2$ and (2), it can be shown that
$\rho_{\rm eff} = \frac{\sigma_{\hat{H}}^2 \mathcal{P}_d}{1 + \sigma_{\tilde{H}}^2 \mathcal{P}_d}$ is a
concave function of $\alpha \in [0, 1]$. Also, $m$ is discrete and non-decreasing on $\alpha \in [0, 1]$ as the number
of active eigen-channels cannot decrease as the data transmission power increases. Here we show a
sketch plot of $\rho_{\rm eff}$ versus $\alpha$ in Fig. 8 to visualize the proof. From (16) we see that $C_{LB}$ is maximized
when $\rho_{\rm eff}$ reaches its maximum for any fixed $m$. Therefore, we will have $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ as the
local optimal points in Fig. 8 which maximize $C_{LB}$ in the corresponding regions of $\alpha$. From the property
of the water-filling solution in (14), we know that $q_i$ is continuous on $\mathcal{P}_d$ and hence is continuous on
$\alpha \in [0, 1]$. Therefore, $C_{LB}$ in (13) is continuous on $\alpha \in [0, 1]$. This implies that $C_{LB}$ is continuous
across the boundaries of different regions of $\alpha$, indicated by the dashed lines in Fig. 8. Consequently,
the global optimal point $\alpha_3 = \alpha^*$ which maximizes $\rho_{\rm eff}$ in Fig. 8 is also the global optimal point
which maximizes $C_{LB}$. It is noted that the objective function $\rho_{\rm eff}$ is the same as that in non-feedback
systems given in [9]. Therefore, the solution of $\alpha^*$ coincides with the solution for non-feedback
systems given in (12). □

Appendix IV 
Proof of Theorem 4 

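We let $\rho_{\rm eff} = \frac{\sigma_{\hat{H}}^2 \mathcal{P}_d}{1 + \sigma_{\tilde{H}}^2 \mathcal{P}_d}$, $y = \sum_{i=1}^{m} \ln \frac{\lambda_i}{m}$, and $z = \sum_{i=1}^{m} \lambda_i^{-1}$. Then the average capacity lower bound
in (11) can be rewritten using (16) as

\[
C_{LB} = \frac{L_d}{L \ln 2} E_{\lambda}\left\{ m \ln(\rho_{\rm eff} + z) + y \right\}.
\]

Differentiating $C_{LB}$ w.r.t. $L_d$ for any fixed $m$ gives

\[
\frac{dC_{LB}}{dL_d} = \frac{1}{L \ln 2} E_{\lambda}\left\{ m \left( \ln(\rho_{\rm eff} + z) + \frac{L_d}{\rho_{\rm eff} + z} \frac{d\rho_{\rm eff}}{dL_d} + \frac{y}{m} \right) \right\}. \tag{28}
\]

Similar to [9], we need to show that $\frac{dC_{LB}}{dL_d} > 0$. It can be shown that $C_{LB}$ is continuous on $L_d$
(treating $L_d$ as a positive real-valued variable) regardless of the value of $m$. Therefore, the value of $m$
does not cause any problem in the proof.

Here we consider the case where $L_d > N_t$ and omit the cases $L_d = N_t$ and $L_d < N_t$, which can be
handled similarly. Taking the derivative of $\rho_{\rm eff}$ w.r.t. $L_d$ with some algebraic manipulation, we have

\[
\frac{d\rho_{\rm eff}}{dL_d} = -\frac{\rho_{\rm eff}}{L_d - N_t} \left( 1 - \sqrt{\frac{N_t (N_t + \mathcal{P}L)}{L_d (L_d + \mathcal{P}L)}} \right). \tag{29}
\]

Substituting (29) into (28), we get

\[
\frac{dC_{LB}}{dL_d} = \frac{1}{L \ln 2} E_{\lambda}\left\{ m \left( \ln(\rho_{\rm eff} + z) - \frac{\rho_{\rm eff}}{\rho_{\rm eff} + z} \frac{L_d}{L_d - N_t} \left( 1 - \sqrt{\frac{N_t (N_t + \mathcal{P}L)}{L_d (L_d + \mathcal{P}L)}} \right) + \frac{y}{m} \right) \right\}.
\]

With $L_d > N_t$, it can be shown that

\[
\frac{L_d}{L_d - N_t} \left( 1 - \sqrt{\frac{N_t (N_t + \mathcal{P}L)}{L_d (L_d + \mathcal{P}L)}} \right) < 1.
\]

Therefore, it suffices to show that

\[
E_{\lambda}\left\{ \ln(\rho_{\rm eff} + z) - \frac{\rho_{\rm eff}}{\rho_{\rm eff} + z} + \frac{y}{m} \right\} \geq 0. \tag{30}
\]

Furthermore, one can show that

\[
\frac{d}{d\rho_{\rm eff}} E_{\lambda}\left\{ \ln(\rho_{\rm eff} + z) - \frac{\rho_{\rm eff}}{\rho_{\rm eff} + z} + \frac{y}{m} \right\} = E_{\lambda}\left\{ \frac{\rho_{\rm eff}}{(\rho_{\rm eff} + z)^2} \right\} \geq 0
\]

for any fixed $m$. Therefore, we only need to show that (30) holds at $\rho_{\rm eff} = 0$, that is,

\[
\begin{aligned}
E_{\lambda}\left\{ \ln z + \frac{y}{m} \right\}
&= E_{\lambda}\left\{ \ln \sum_{i=1}^{m} \lambda_i^{-1} + \frac{1}{m} \sum_{i=1}^{m} \ln \frac{\lambda_i}{m} \right\} \\
&\geq E_{\lambda}\left\{ \frac{1}{m} \sum_{i=1}^{m} \ln\left(m \lambda_i^{-1}\right) + \frac{1}{m} \sum_{i=1}^{m} \ln \frac{\lambda_i}{m} \right\} \\
&= E_{\lambda}\left\{ \ln \frac{m}{m} \right\} = 0,
\end{aligned} \tag{31}
\]

where (31) is obtained using the concavity of $\ln(\cdot)$. Therefore, we conclude that $\frac{dC_{LB}}{dL_d} > 0$, which
implies the training length should be kept minimum, i.e., $L_p^* = N_t$. □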
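The key step (31) is an application of Jensen's inequality, and it holds for every realization of the eigenvalues, not only on average. The sketch below checks the per-realization quantity $\ln z + y/m$ numerically, taking $y = \sum_i \ln(\lambda_i/m)$ and $z = \sum_i \lambda_i^{-1}$ as in the proof; the function name and the sampling range are illustrative assumptions.

```python
import numpy as np

def jensen_gap(lam):
    """ln(z) + y/m with z = sum(1/lam) and y = sum(ln(lam/m)); never negative."""
    lam = np.asarray(lam, dtype=float)
    m = lam.size
    z = np.sum(1.0 / lam)
    y = np.sum(np.log(lam / m))
    return np.log(z) + y / m

# Equal eigenvalues give equality; unequal ones give a strictly positive gap.
gap_equal = jensen_gap([2.0, 2.0, 2.0])
gap_unequal = jensen_gap([0.5, 2.0, 3.5])

# Random positive eigenvalues (sampling range chosen arbitrarily).
rng = np.random.default_rng(1)
gaps = [jensen_gap(rng.uniform(0.1, 5.0, size=4)) for _ in range(100)]
```

Because the gap is non-negative for each draw, its expectation over any eigenvalue distribution is non-negative as well, which is what (31) requires.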

Appendix V 
Proof of Theorem 5 

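For any positive definite matrix $A$, $\log_2 |A|$ is increasing in $A$ [26]. Also, for any positive
semi-definite matrix $B$, $I + H_0^\dagger H_0 B$ is a positive definite matrix [2]. Since $\hat{G}\hat{Q}(1 + \mu^{-1}\mathcal{P}_d)^{-1}$ is a positive
semi-definite matrix, the capacity lower bound in (22) is maximized when the diagonal entries of
$\hat{G}\hat{Q}(1 + \mu^{-1}\mathcal{P}_d)^{-1}$ are maximized.

The $i$th non-zero diagonal entry of $\hat{G}\hat{Q}(1 + \mu^{-1}\mathcal{P}_d)^{-1}$ is given by

\[
\rho_{{\rm eff},i} = \frac{(g_i - \mu^{-1}) \mathcal{P}_d}{(1 + \mu^{-1} \mathcal{P}_d) L_p} = \frac{g_i}{L_p} \, \frac{\mathcal{P}_p \mathcal{P}_d + \mathcal{P}_d (y - g_i^{-1})}{\mathcal{P}_p + \mathcal{P}_d + y}, \tag{32}
\]

where we have used (20) and let $y = \mu - \mathcal{P}_p = \frac{1}{L_p} \sum_{i=1}^{L_p} g_i^{-1}$. Substituting the definition of $\alpha$ into (32) with
some algebraic manipulation, we get

\[
\rho_{{\rm eff},i} = \frac{g_i \mathcal{P}L}{L_p (L_d - L_p)} \, \frac{\alpha (1 - \alpha) + \alpha \frac{L_p}{\mathcal{P}L} (y - g_i^{-1})}{-\alpha + \frac{\mathcal{P}L + L_p y}{\mathcal{P}L (1 - L_p / L_d)}}. \tag{33}
\]

Here we consider the case where $L_d > L_p$ and omit the cases $L_d = L_p$ and $L_d < L_p$, which can be
handled similarly. It can be shown that $\rho_{{\rm eff},i}$ in (33) is concave in $\alpha \in (0, 1)$. Therefore, the optimal
$\alpha$ occurs at $\frac{d\rho_{{\rm eff},i}}{d\alpha} = 0$, which is the root of $\alpha^2 - 2\alpha\gamma + \gamma + \gamma z = 0$, where $\gamma = \frac{\mathcal{P}L + L_p y}{\mathcal{P}L (1 - L_p / L_d)}$ and
$z = \frac{L_p}{\mathcal{P}L} (y - g_i^{-1})$. It is clear that $\alpha$ depends on $g_i$ through $z$. Therefore, there is no unique $\alpha$ which
maximizes all $\rho_{{\rm eff},i}$. However, this dependence disappears when $\mathcal{P}L \gg L_p y = \sum_{i=1}^{L_p} g_i^{-1}$. Under this
condition, one can show that $\gamma \approx \frac{L_d}{L_d - L_p}$ and $z \approx 0$, and there exists a unique solution of $\alpha^*$ which
maximizes all the diagonal entries of $\hat{G}\hat{Q}(1 + \mu^{-1}\mathcal{P}_d)^{-1}$, given by

\[
\alpha^* = \gamma - \sqrt{\gamma (\gamma - 1)}, \quad \text{where } \gamma = \frac{L_d}{L_d - L_p}. \quad \Box
\]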
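The stationarity condition used in this proof can be verified numerically. The sketch below evaluates the $\alpha$-dependent factor of (33), namely $\frac{\alpha(1-\alpha) + \alpha z}{\gamma - \alpha}$, and compares a grid search against the smaller root of $\alpha^2 - 2\alpha\gamma + \gamma + \gamma z = 0$. The function names and the example values of $\gamma$ and $z$ are illustrative assumptions.

```python
import math

def rho_eff_shape(alpha, gamma, z):
    """alpha-dependent factor of (33); the positive prefactor is omitted."""
    return (alpha * (1.0 - alpha) + alpha * z) / (gamma - alpha)

def stationary_alpha(gamma, z):
    """Smaller root of alpha^2 - 2*alpha*gamma + gamma*(1 + z) = 0."""
    return gamma - math.sqrt(gamma * gamma - gamma * (1.0 + z))

# With z = 0 the root reduces to gamma - sqrt(gamma * (gamma - 1)).
gamma = 1.125                      # e.g. L_d = 18 and L_p = 2
alpha_root = stationary_alpha(gamma, 0.0)

# A brute-force search over (0, 1) should land on the same point.
grid = [k / 1000.0 for k in range(1, 1000)]
alpha_grid = max(grid, key=lambda a: rho_eff_shape(a, gamma, 0.0))

# A non-zero z, which carries the dependence on g_i, shifts the optimum.
alpha_shifted = stationary_alpha(1.2, 0.05)
```

The shift caused by a non-zero `z` is the reason a single $\alpha$ cannot maximize every $\rho_{{\rm eff},i}$ unless $\mathcal{P}L \gg \sum_{i=1}^{L_p} g_i^{-1}$, in which case `z` is negligible for all trained eigen-channels.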
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